Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing
Authors
Abstract
This paper proposes a method to improve shift-reduce constituency parsing by using lexical dependencies. The lexical dependency information is extracted from a large volume of auto-parsed data, produced by running a baseline shift-reduce parser on unlabeled data. We then incorporate a set of novel features defined on this information into the shift-reduce parsing model. These features help to resolve action conflicts during decoding. Experimental results show that the new features yield absolute improvements of 0.9% and 1.1% over a strong baseline on English and Chinese, respectively. Moreover, the improved parser outperforms all previously reported shift-reduce constituency parsers.
Title and Abstract in Chinese
利用大规模数据词汇依存关系改进移进-归约成分句法分析
本文提出了一种利用词汇依存关系改进移进-归约成分句法分析的方法。首先，我们利用基准系统在大规模无标注数据上进行自动句法分析并从分析结果中抽取词汇依存关系。其后，我们在词汇依存信息的基础上定义了一组新特征并将这些特征整合到移进-归约句法分析模型中。新特征用于帮助消除移进-归约过程中的动作歧义。实验结果表明，新特征在英文和中文数据上分别取得了0.9%和1.1%的性能改进。最终得到的句法分析器的性能优于相关研究工作中所报告的移进-归约句法分析器的性能。
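The abstract gives only a high-level description of the approach, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that lexical dependencies are stored as (head word, modifier word) pairs, that their counts over auto-parsed data are bucketed by frequency, and that the bucket is combined with a candidate parser action to form a feature string. The function names, the bucket thresholds, and the action labels (`REDUCE-R`, `REDUCE-L`, `SHIFT`) are all hypothetical.

```python
from collections import Counter


def count_dependencies(parsed_sentences):
    """Count (head, modifier) word pairs over auto-parsed sentences.

    Each sentence is assumed to be a list of (head_word, modifier_word)
    tuples read off the baseline parser's output (hypothetical format).
    """
    counts = Counter()
    for dependencies in parsed_sentences:
        counts.update(dependencies)
    return counts


def bucket(count, high=3, low=1):
    """Map a raw count to a coarse frequency bucket (thresholds are assumptions)."""
    if count >= high:
        return "HIGH"
    if count >= low:
        return "LOW"
    return "NONE"


def dependency_feature(counts, head, modifier, action):
    """Build a feature string pairing a candidate action with the bucket
    of the lexical dependency that the action would create."""
    return f"{action}|dep={bucket(counts[(head, modifier)])}"


# Toy auto-parsed data: three sentences' worth of dependency pairs.
auto_parsed = [
    [("ate", "apple"), ("apple", "the")],
    [("ate", "apple")],
    [("ate", "apple")],
]
counts = count_dependencies(auto_parsed)
feature = dependency_feature(counts, "ate", "apple", "REDUCE-R")
```

In a sketch like this, a frequently observed head-modifier pair produces a distinct feature value for the action that would attach the two words, which is one way such statistics could help a model choose between conflicting actions.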
Similar resources
Improving shift-reduce constituency parsing with large-scale unlabeled data
Shift-reduce parsing has been studied extensively for diverse grammars because of its simplicity and running efficiency. However, in the field of constituency parsing, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Procee...
Decreasing Lexical Data Sparsity in Statistical Syntactic Parsing - Experiments with Named Entities
In this paper we present preliminary experiments that aim to reduce lexical data sparsity in statistical parsing by exploiting information about named entities. Words in the WSJ corpus are mapped to named entity clusters and a latent variable constituency parser is trained and tested on the transformed corpus. We explore two different methods for mapping words to entities, and look at the effec...
Partial Training for a Lexicalized-Grammar Parser
We propose a solution to the annotation bottleneck for statistical parsing, by exploiting the lexicalized nature of Combinatory Categorial Grammar (CCG). The parsing model uses predicate-argument dependencies for training, which are derived from sequences of CCG lexical categories rather than full derivations. A simple method is used for extracting dependencies from lexical category sequences, ...
Discontinuous parsing with continuous trees
We introduce a new method for incremental shift-reduce parsing of discontinuous constituency trees, based on the fact that discontinuous trees can be transformed into continuous trees by changing the order of the terminal nodes. It allows for a clean formulation of different oracles, leads to faster parsers and provides better results. Our best system achieves an F1 of 80.02 on TIGER.
Uptraining for Accurate Deterministic Question Parsing
It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of th...